[cuebot] Add memory stranded cores metric#2394
Conversation
Add three new prometheus metrics. cue_cores_total, cue_cores_idle cue_cores_memory_stranded. These metrics are intended to evaluate how many cores of the farm are being wasted per allocation due to not having enough memory to pick up more frames.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR introduces per-allocation stranded core statistics that are collected from the database and exposed as Prometheus gauge metrics. The changes add a new data model, DAO query, service layer, metrics collection logic, Spring wiring, and test coverage to measure memory-constrained cores across allocations. ChangesMemory-stranded Core Metrics Collection
Sequence Diagram(s)sequenceDiagram
participant PrometheusMetricsCollector
participant HostManager
participant HostManagerService
participant HostDaoJdbc
participant Database
PrometheusMetricsCollector->>HostManager: getStrandedCoreStats()
HostManager->>HostManagerService: delegate getStrandedCoreStats()
HostManagerService->>HostDaoJdbc: getStrandedCoreStats()
HostDaoJdbc->>Database: execute GET_STRANDED_CORE_STATS SQL
Database-->>HostDaoJdbc: result rows
HostDaoJdbc-->>HostManagerService: List<StrandedCoreStats>
HostManagerService-->>HostManager: List<StrandedCoreStats>
HostManager-->>PrometheusMetricsCollector: List<StrandedCoreStats>
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cuebot/src/main/java/com/imageworks/spcue/PrometheusMetricsCollector.java`:
- Around line 281-284: The stranded-core gauges (coresTotal, coresIdle,
coresMemoryStranded) are being cleared before calling
hostManager.getStrandedCoreStats(), so if that DAO call fails we drop all
series; change the flow to first call hostManager.getStrandedCoreStats() and
capture the result (handle exceptions), and only clear and repopulate the three
gauges after the read succeeds (e.g. if stats list is non-null/returned without
exception), using the same variables coresTotal, coresIdle, coresMemoryStranded
to update values; this preserves existing series on DAO failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ab649a14-dd59-4b14-8c44-a1ca05dca291
📒 Files selected for processing (8)
cuebot/src/main/java/com/imageworks/spcue/PrometheusMetricsCollector.javacuebot/src/main/java/com/imageworks/spcue/StrandedCoreStats.javacuebot/src/main/java/com/imageworks/spcue/dao/HostDao.javacuebot/src/main/java/com/imageworks/spcue/dao/postgres/HostDaoJdbc.javacuebot/src/main/java/com/imageworks/spcue/service/HostManager.javacuebot/src/main/java/com/imageworks/spcue/service/HostManagerService.javacuebot/src/main/resources/conf/spring/applicationContext-service.xmlcuebot/src/test/java/com/imageworks/spcue/test/dao/postgres/HostDaoTests.java
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Diego Tavares <dtavares@imageworks.com>
There was a problem hiding this comment.
Thanks, @DiegoTavares
Approved with 2 minor changes required/suggested.
- Missing the Apache license header in two files.
Co-authored-by: Ramon Figueiredo <ramon.fgrd@gmail.com> Signed-off-by: Diego Tavares <dtavares@imageworks.com>
c8618f6
into
AcademySoftwareFoundation:master
Add three new prometheus metrics. cue_cores_total, cue_cores_idle cue_cores_memory_stranded. These metrics are intended to evaluate how many cores of the farm are being wasted per allocation due to not having enough memory to pick up more frames.
Summary by CodeRabbit